Enabling target-aware molecule generation to follow multi objectives with Pareto MCTS

Target-aware drug discovery has greatly accelerated the drug discovery process to design small-molecule ligands with high binding affinity to disease-related protein targets. Conditioned on targeted proteins, previous works utilize various kinds of deep generative models and have shown great potential in generating molecules with strong protein-ligand binding interactions. However, beyond binding affinity, effective drug molecules must manifest other essential properties such as high drug-likeness, which are not explicitly addressed by current target-aware generative methods. In this article, aiming to bridge the gap of multi-objective target-aware molecule generation in the field of deep learning-based drug discovery, we propose ParetoDrug, a Pareto Monte Carlo Tree Search (MCTS) generation algorithm. ParetoDrug searches molecules on the Pareto Front in chemical space using MCTS to enable synchronous optimization of multiple properties. Specifically, ParetoDrug utilizes pretrained atom-by-atom autoregressive generative models for the exploration guidance to desired molecules during MCTS searching. Besides, when selecting the next atom symbol, a scheme named ParetoPUCT is proposed to balance exploration and exploitation. Benchmark experiments and case studies demonstrate that ParetoDrug is highly effective in traversing the large and complex chemical space to discover novel compounds with satisfactory binding affinities and drug-like properties for various multi-objective target-aware drug discovery tasks.

the iteration time from 50 to 150 to ensure the same computational budgets as Pare-toDrug.The hyperparameters of ParetoDrug are available in the code repository.The iteration time of the Pareto MCTS part in ParetoDrug is also set at 150.

C. Comparison of Computational Efficacy
We compare the computational efficacy of the RL or MCTS-based methods including AlphaDrug, REINVENT 4, and ParetoDrug with the same hard computational resources including 1 NVIDIA GPU and 8 CPU cores.We use the first protein target (PDB ID: 1A9U) in the benchmark datasets as the test case.We run all methods 3 trials and calculate the average time.The results are shown in Supplementary Table 1.Note that the time for each molecule counts the computationally expensive docking time.We can see the computational efficacy of ParetoDrug is similar to AlphaDrug but slower than the well-developed molecular design tool REINVENT 4. The reason may be that AlphaDrug and ParetoDrug use the additional pretrained model (i.e.Lmser Transformer) to guide molecule generation.But the pretrained model is necessary to help generate molecules with higher binding affinities.

D. Score Distributions of A Given Target
We also plot the generated molecules for a given target (PDB ID: 1A9U) by running AlphaDrug, ParetoDrug, and REINVENT 4. The running statistics are given in Appendix and the molecule score distributions are plotted in Supplementary Figure 1.Note that we plot the top 500 molecules (no repeated molecules) with the highest Docking scores collected in three trials for each method to reflect the score distribution of the algorithms.We can see that AlphaDrug has the best Docking score distribution but its LogP and QED distributions are in terrible situations.For example, as shown in Supplementary Figure 1, almost all the molecules generated by AlphaDrug have a QED value of less than 0.

E. Diversity of Generated Molecules
In this section, we study the molecule diversity of ParetoDrug.AlphaDrug and Pare-toDrug perform MCTS with guidance from the pretrained generative model.The two algorithms explore during the MCTS by the selection randomness, which brings the molecule diversity.Specifically, the diversity of ParetoDrug is achieved by the selection randomness, which could be divided into two parts as indicated by Eq. ( 6) in the article.First, the pretrained model would provide the next-atom distribution based on the current inputting molecule fragment.This brings the randomness for generating molecules.This randomness is also for AlphaDrug.Second, when selecting the next node, ParetoDrug selects randomly from the Pareto-dominate candidate set.As there is no dominant molecule in most cases, different molecules would be generated for evaluation.This randomness is not for AlphaDrug.We also perform experiments to examine the molecule diversity of ParetoDrug.We give the statistics of running 5 trials on the same target with AlphaDrug and ParetoDrug.The diversity ratio is calculated as the number of unique molecules (do not appear in the other four trials) in this trial divided by the number of molecules in the same trial.Results are shown in Supplementary Table 2. From Supplementary

F. Protein-ligand Interaction Analysis
For the analysis of the protein-ligand interactions, we use PLIP 6 , a fully automated protein-ligand interaction web tool.We mainly analyze two kinds of interactions, the hydrogen bond and the π-π stacking.The hydrogen bond is an attractive interaction between a hydrogen atom from a molecule or a molecular fragment X-H in which X is more electronegative than H, and an atom or a group of atoms in the same or another molecule, in which there is evidence of a bond formation 7 .The π-π stacking refers to the interaction between aromatic rings governed by an interplay of electrostatic, van-der-Waals, and hydrophobic interactions 8 .Stable arrangements of aromatic rings are either in parallel (sandwich) or a perpendicular (T-shaped) orientation 9 .Both the hydrogen bond and π-π stacking are recognized as the most important factors of the stabilization mechanism in the protein-ligand interaction analysis.

G. Settings of MM-GBSA
MM-GBSA is the end-point approach employing molecular mechanics, the generalized Born model, and solvent accessible surface area method to estimate binding free energies from structural information, overcoming the computational complexities associated with molecular simulations.Within the MM-GBSA approach, parameters are defined under the additivity approximation, treating the free energy change as a sum of various physical energy components.We performed molecular dynamics simulations for all protein-ligand complexes using pmemd.cuda in the AMBER22 package 10 with the ff19SB 11 and gaff (version 1.81) 12 force fields respectively for the proteins and ligands.The HIS residue was protonated at the epsilon position.Partial charges of ligands are calculated using the AM1-BCC model 13 .The protein-ligand structures were solvated in the TIP3P 14 water box and the distance between the solute and the edge of the box was at least 10.0 Å. Na+ or Cl-ions were added to neutralize the unbalanced charge.The positions of added waters do not take into account the presence of the solute or ions.For more information, please refer to Supplementary Table 3.To reduce these bad contacts, three energy minimization steps were preformed to create a stable system for the next production simulations.In the first step, the energy of water and ions was minimized and the protein backbone was restrained with an elastic constant of 50 kcal • mol −1 • Å−2 .In the second step, the elastic constant decreased to 10 kcal•mol −1 • Å−2 .At last, the whole system was minimized without any restraint.The first 2000 cycles utilize the steepest descent algorithm before shifting to the conjugate gradient algorithm for the remaining 3000 cycles for every minimization.The cutoff for the non-bonded interactions (van der Waals and electrostatic interactions) was set to 10.0 Å.The particle mesh Ewald (PME) 15 algorithm was employed to estimate the long-range electrostatic interactions with the periodic boundary condition (PBC).
Then, during 50 ps, the minimized system was heated from 0 to 300 K in the NVT ensemble, followed by the 100-ps NPT equilibration runs at 1 atm pressure with weak restraints of 2.0 kcal • mol −1 • Å−2 on the heavy atoms of the proteins and 500ps NPT equilibration runs without any restraint.In the last production runs, the conformations of protein-ligand complexes were sampled using 100 ns NPT molecular dynamics at 300 K with the Langevin thermostat and 1 atm pressure.The SHAKE 16 algorithm was used to constrain the covalent bonds involving hydrogen atoms.The time step was set to 2 fs and 20,000 snapshots were saved for every complex.For each system, three independent simulations were performed from the docked pose.
Time-course analysis of root mean square deviations (RMSDs) and Root mean square fluctuations (RMSFs) for all molecular simulations (shown in Supplementary Figures 2-5) was done using the cpptraj (version 4.14.0)module in AMBER18.
We uniformly extract 100 samples from the last 10,000 snapshots for MM-GBSA binding free energy calculations (shown in Supplementary Tables 4-7).Note that, in this study, a newer variant of MM-GBSA with the variable dielectric constant model implemented in a modified version of MMPBSA.py in AMBER18 was used 17 .The key idea is to assign the alternative dielectric value for atoms of the solute in the polar solvation free energies and electrostatic interactions described as and where q i is the charge on the atom i within the protein, and q j is the charge within the atom j of the ligand, and r ij is the distance between the atom i and atom j.The symbols ε in and ε sol correspond to the dielectric constants of the solute and the solvent (usually water) respectively.Meanwhile, α ij is defined as the geometric average of the Born radii α i and α j .In the variable dielectric model, different dielectric constant ε in is assigned to the residue type k, which interacts with the ligand.
Supplementary Table 3: The molecular dynamics condition for the complex of each compound and its protein target.
distributions of generated molecules by AlphaDrug, ParetoDrug, and REINVENT 4 for a given target (PDB ID: 1A9U) with n = 500 molecules.(A) The docking score (kcal • mol −1 ) distributions of each method.(B) The LogP value distributions of each method.(C) The QED value distributions of each method.(D) The SA score distributions of each method.(E) The NP-likeness distributions of each method.the score distributions indicate that ParetoDrug is exploring different chemical spaces with AlphaDrug although they utilize the same pretrained generative model.

Supplementary Figure 2 :
Time course of RMSDs and RMSFs for the complexes of molecules binding to the protein target FXR (PDB ID: 7D42).(A) RMSD (left) and RMSF (right) of Tropifexor binding to FXR. (B) RMSD and RMSF of Compound 1 binding to FXR. (C) RMSD and RMSF of Compound 2 binding to FXR. (D) RMSD and RMSF of Compound 3 binding to FXR. (E) RMSD and RMSF of Compound 4 binding to FXR.

Table 1 :
Comparison of computational efficacy of RL or MCTS-based methods with n = 3 trails.The unit of Average Time and Time Per Molecule is second.The 95% confidence interval of mean values is given.

Table 2 ,
we found that most molecules generated in trial 2 and trial 3 of AlphaDrug are the same.ParetoDrug could have a better diversity ratio than AlphaDrug by the second selection randomness based on the Pareto-dominate candidate set.This is a desired property for molecule generation as we always want more new molecule candidates with good properties.

Table 2 :
Diversity Ratio for Molecules in Each Trial.

Table 4 :
The energy terms in MM-GBSA for the complex of each compound and the protein target FXR (PDB ID: 5G2N).